(54) A method of creating 3-D facial models starting from face images



(57) The method allows the creation of 3-D facial
models, which can be used, for instance, for avatar
implementation, video-communication applications,
video games, video productions, and for the creation of
advanced man-machine interfaces. At least one image
of a human face is provided together with a 3-D facial
model (M) having a vertex structure and comprising a
number of surfaces chosen within the set formed by a
face surface (V), surfaces of the right eye (OD) and left
eye (OS), respectively, and surfaces of the upper teeth
(DS) and lower teeth (DI), respectively. Among the vertices
of the structure of the model (M) and on such at
least one face image, respective sets of homologous
points are chosen. The model structure (M) is then
modified in such a way that the above respective sets of
homologous points are made to coincide.
Description

[0001] This invention concerns the technique for the creation of 3-D facial models, which can be used for instance
for the implementation of so-called avatars (anthropomorphous models) to be used in virtual environments,
video-communication applications, video games, TV productions, and the creation of advanced man-machine interfaces.

[0002] There are already some known technical solutions for the creation of a 3-D model starting from the photograph
of a person's face.

[0003] On this subject matter, reference can be made for instance to the product Character Creator of company
Darwin 3D (see Internet site http://www.darwin3d.com) as well as to the product Avatar Maker of company Sven Technologies
(see Internet site http://www.sven-tec.com). The product "Character Creator" is based on the choice of a basic
model resembling the photographed person. The face of the photograph is framed by an ellipse and the program uses
what lies within the ellipse as a texture of the model. In the product "Avatar Maker" a dozen points are marked on the
face, and a basic model is then chosen to which the photograph texture is associated.

[0004] The main drawback of such known embodiments is that the structure of the generated model does not allow
a subsequent animation. This is due to the fact that the model (usually generated as a "wire frame" model, i.e. starting
from a mesh structure, as will also be seen in the sequel) cannot exactly fit the profile in the mouth region, thus preventing
reproduction of lip movements. This also applies to other significant parts of the face, such as eyes and nose.

[0005] This invention aims at providing a method which allows the creation of facial models that can appear realistic
both in static conditions and in animation conditions, in particular for instance as far as the opening and closing of eyelids
and the possibility of simulating eye rotation are concerned.

[0006] According to the invention, this aim is attained through a method having the characteristics specifically men- 
tioned in the appended claims. 

[0007] Substantially, the method according to the invention is based on the adaptation of a basic model of a face -
typically a human face - having the physiognomy characteristics of the photographed person. The basic model (or "template")
is represented by a structure, preferably of the type called "wire frame", formed by a plurality of surfaces chosen
out of a set of five surfaces, namely:

- face,
- right eye and left eye, and
- upper teeth and lower teeth.

[0008] The eye surfaces are separated from those of the face so as to allow, among other things, creation of opening
and closing movements of eyelids, and a slight translation simulating the actual eye rotation. Similarly, it is possible
to perform the animation of the model, as far as the speech is concerned, through the animation of the surfaces representing
the upper and lower teeth.

[0009] The invention will now be described by way of a non-limiting example, with reference to the drawings
attached hereto, in which:

- Figures 1A and 1B represent the typical look of the models used in the embodiment of the invention, represented
in the wire frame mode (Figure 1A) and in the solid mode (Figure 1B), respectively,
- Figure 2 represents the same model as shown in Figure 1 in rear view, also in this case both in the wire frame mode
(Figure 2A) and in the solid mode (Figure 2B),
- Figures 3A to 3I represent a set of tables which identify the feature points of a face according to the present state
of the MPEG-4 standard, which face can be used for the embodiment of the invention,
- Figure 4 schematically shows one of the phases of the method according to the invention,
- Figure 5 schematically shows another phase of the method according to the invention,
- Figure 6 depicts, in three parts denoted by 6A, 6B and 6C respectively, the evolution of the model within the method
according to the invention,
- Figure 7, which also comprises three parts, represents in part 7A a photograph highlighting the feature points used
for the calibration in a possible embodiment of the method according to the invention, and in parts 7B and 7C two
views of the resulting model, complete with texture,
- Figure 8 depicts, in the form of a block diagram, the structure of a system which can be used for carrying out the
invention,
- Figure 9 is a flow chart concerning a possible embodiment of the method according to the invention,
- Figures 10 and 11 exemplify the application of a so-called texture within the present invention.

EP 0 991 023 A2

[0010] Figures 1 and 2 show a basic model M of human face, which can be used in a possible embodiment of the
invention. Model M is here represented both in the wire frame mode and in the solid mode. The latter differs from the
wire frame essentially by the background painting of the triangles of the wire frame. The model M here represented is
formed by five surfaces, namely:

- face V, formed - in the embodiment illustrated herein - by 360 vertices and 660 triangles, the term "vertex" being
used in its geometrical meaning, i.e. vertex of an angle,
- right eye OD and left eye OS, each consisting of 26 vertices and 37 triangles,
- upper teeth DS and lower teeth DI, each consisting of 70 vertices and 42 triangles.
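
As an illustration of the five-surface structure just listed, a minimal Python sketch follows. The class names `Surface` and `FaceModel`, the `translate` helper and the `EXEMPLARY_COUNTS` table are hypothetical names chosen for this sketch; only the surface labels and the vertex and triangle counts come from the text.

```python
from dataclasses import dataclass, field

# Vertex and triangle counts of the exemplary embodiment, as given in the text.
EXEMPLARY_COUNTS = {
    "V": (360, 660),   # face
    "OD": (26, 37),    # right eye
    "OS": (26, 37),    # left eye
    "DS": (70, 42),    # upper teeth
    "DI": (70, 42),    # lower teeth
}


@dataclass
class Surface:
    """One wire-frame surface: vertices plus triangles indexing into them."""
    vertices: list   # [(x, y, z), ...]
    triangles: list  # [(i, j, k), ...] indices into `vertices`

    def translate(self, dx, dy, dz):
        """Move this surface only; other surfaces of the model are untouched."""
        self.vertices = [(x + dx, y + dy, z + dz) for x, y, z in self.vertices]


@dataclass
class FaceModel:
    """Hypothetical container keeping each surface as a separate mesh."""
    surfaces: dict = field(default_factory=dict)

    def totals(self):
        """Return (vertex count, triangle count) summed over all surfaces."""
        n_vertices = sum(len(s.vertices) for s in self.surfaces.values())
        n_triangles = sum(len(s.triangles) for s in self.surfaces.values())
        return n_vertices, n_triangles


# Totals implied by the exemplary counts.
TOTAL_VERTICES = sum(v for v, _ in EXEMPLARY_COUNTS.values())
TOTAL_TRIANGLES = sum(t for _, t in EXEMPLARY_COUNTS.values())
```

Keeping the surfaces as separate meshes means that an eye or a dental arch can be displaced without touching the face surface, which is the property the description relies on for animation.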

[0011] It will be appreciated in particular that model M is a hollow structure, which may practically be assimilated to
a sort of mask, the shape of which is designed to reproduce the features of the modelled face. Of course, though corresponding
to an embodiment of the invention being preferred at present, the number of vertices and triangles to which
reference has been previously made has a merely exemplary character and must in no case be regarded as a limitation
of the scope of the invention.

[0012] These considerations also apply to the choice of using five different surfaces to implement the basic model.
As a matter of fact, the number of such surfaces might be smaller (for the implementation of simpler models) or larger
(for the implementation of more detailed and sophisticated models), depending on the application requirements. The
important feature is the choice of using, as the basic model, a model comprising a plurality of surfaces and in particular
surfaces that, depending on the type of face to be modelled (for instance a human face), correspond to shapes which
are substantially known in general terms and have a relative arrangement which, as a whole, is also already known.

[0013] As a matter of fact, although the typology of the human face is practically infinite, it is known that the surface
of the face has a general bowl-like look, that the eyelids have generally just an "eyelid" surface, which is at least marginally
convex, that the dental arches have an arc shape, etc. It is then known that the eyelids are located in the medium-upper
region of the face surface, whereas the teeth surfaces are located in the lower region.

[0014] Furthermore, the fact of using distinct surfaces for the creation of the model allows applying to the model
separation conditions, such as those which make it possible to avoid, for instance, the interference of the teeth surfaces, so
as to accurately model the congruency effect of the dental arches.

[0015] This characteristic might be even better appreciated in the rear views of Figure 2.

[0016] The method according to the invention is substantially based on the solution of:

- taking an image (typically a front photograph) of the face to be modelled, and
- modifying the model or template through a series of geometric transformations so that its projection coincides with
a set of points identified on the photograph assumed as a starting image.

[0017] For this adaptation, use is made of respective sets of points which have been chosen in correspondence
with as many so-called "feature points": such points are defined in the section "Face and body animation" of the ISO/IEC
standard 14496-2 (MPEG-4) and are represented in Figures 3A to 3H.

[0018] In particular, in an embodiment of the invention being preferred at present, the method according to the
invention is implemented by using the feature points identified in the MPEG-4 standard (as defined at the filing date of
this invention) by the following indexes: 11.4, 2.1, 10.9, 10.10, 8.4, 8.1, 8.3, 8.2, 2.2, 2.3, 9.3, 9.2, 9.1, 4.1, 3.12, 3.8,
3.10, 3.14, 3.11, 3.13, 3.7, and 3.9. Each of such indexes corresponds with a vertex of the model structure.
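
The index list above can be kept as data and used to check that every required point has been marked on the photograph. This is a minimal sketch: `FEATURE_POINTS` reproduces the indexes listed in the text, while the function `missing_points` and the dictionary layout are assumptions made for illustration.

```python
# MPEG-4 feature point indexes used in the preferred embodiment (from the text).
FEATURE_POINTS = (
    "11.4", "2.1", "10.9", "10.10", "8.4", "8.1", "8.3", "8.2",
    "2.2", "2.3", "9.3", "9.2", "9.1", "4.1", "3.12", "3.8",
    "3.10", "3.14", "3.11", "3.13", "3.7", "3.9",
)


def missing_points(marked):
    """Return the required indexes that are absent from `marked`.

    `marked` maps an index such as "10.9" to the (X, Y) position picked
    on the image; this layout is an assumption of the sketch.
    """
    return [index for index in FEATURE_POINTS if index not in marked]
```

The indexes are stored as strings because "10.10" and "10.1" would collapse into the same value if they were parsed as numbers.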

[0019] Figure 4 synthesises the method according to the invention, as it can be performed through the system
shown in Figure 8.

[0020] Such a system, denoted by 1 as a whole, includes a pick-up unit 2, for instance a digital camera or a functionally
equivalent unit, such as a conventional camera capable of producing photographs which, after development and
print, may be subjected to a scanning process. Starting from a subject, unit 2 can therefore generate a plane image I of
the face to be modelled: this image is in practice an image of the type shown in Figure 7A.

[0021] The image I so obtained is in the form of a digitised image, i.e. of a sequence of data that represent, pixel by
pixel, the information (brightness, chromatic characteristics, etc.) relating to the same image.

[0022] Such a sequence of data is provided to a processing system 3 (essentially a computer) which performs -
according to principles well known to a specialist, once the criteria of the embodiment of the invention described in
detail in the following have been set forth - the operations listed below:

- identification and extraction of the feature points of the image I, designed to be used for processing model M,
- reading from a memory or a similar support 4, associated to the processor, of the data corresponding to the starting
model, which data have been previously stored and are read also in this case according to well known modalities,
- execution of the processing operations typical of the method according to the invention, as better described in the
sequel, and
- generation of the processed output model, also in this case in the form of digital data representative of the 3-D
model; such data can be transferred to and loaded into another processing system (for instance an animation system)
and/or downloaded into a storage support 5 (floppy disc, CD-ROM, etc.) for their subsequent use.

[0023] The operation of adaptation of the starting model M, previously described, to image I is based on a virtual
optical projection of model M and image I, respectively, performed in a system the focus of which lies in the origin O of
a three-dimensional Cartesian space X, Y, Z in which model M is placed in the positive half-space along the Z axis and
image I is placed in the negative half-space (see the diagram of Figure 4).

[0024] It will be appreciated that the fine adaptation of model M to image I is based on the assumption that model
M is on the whole oriented, with regard to the plane XY of the above-described system, in a generally mirror-like position
with regard to image I. Hence, model M is placed with a front orientation, if one requires adaptation to a front image
I. On the contrary, model M will be for instance laterally oriented, if it is required to achieve adaptation to a side image
of the head of the person represented in image I.

[0025] This also substantially applies to the distance α between origin O and the centre of model M and distance λ
between origin O and the plane of image I. To simplify the calibration process and avoid the introduction of unknown
values by the user, at least distance α is set to an arbitrary value (for instance 170 cm), determined in advance by calculating
the average of a set of possible cases. It must be still considered that value α depends not only on the distance
of the subject from camera 2 at the time when image I was taken, but also on the parameters of the same camera.
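
The geometry of this virtual projection can be sketched as follows, under the assumption of an ideal central (pinhole) projection through origin O: the ray from a model vertex through O meets the image plane, which lies at distance `lam` from O on the negative side of the Z axis. The function name `project_through_origin` and the parameter name `lam` are hypothetical; the text does not spell out the projection formula at this point.

```python
def project_through_origin(vertex, lam):
    """Project a model vertex through origin O onto the plane z = -lam.

    Assumes an ideal pinhole projection: the model lies at z > 0 and the
    image plane at z = -lam, so the projected point is mirrored in x and y.
    """
    x, y, z = vertex
    if z <= 0:
        raise ValueError("the model is expected in the positive half-space (z > 0)")
    if lam <= 0:
        raise ValueError("lam must be a positive distance")
    scale = lam / z
    return (-x * scale, -y * scale)
```

For instance, `project_through_origin((10.0, 20.0, 170.0), 85.0)` returns `(-5.0, -10.0)`: the point is halved in size because the image plane is half as far from O as the vertex, and the sign flip reflects the mirror-like placement of model and image.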
[0026] Substantially, the method according to the invention consists of a series of geometrical transformations
aimed at making the projection of the set of feature points of the model M of interest coincide with the homologous set
of points identified on image I.

[0027] Let then $(x_{i.j}, y_{i.j}, z_{i.j})$ be the space co-ordinates of the vertex of model M associated to feature point i.j (for
instance the left end of the face) and $(X_{i.j}, Y_{i.j})$ be the co-ordinates in image I of the same feature point (referred to a
local system on the plane of image I, with the origin coinciding with the upper angle of the image, in a possible embodiment).

[0028] After starting the process (step 100 in the flow chart of Figure 9), the first operational step (101 in Figure 9)
is the computation of value λ.

[0029] Let $X_0$, $Y_0$ be the co-ordinates of the centre of the face taken in image I. These co-ordinates are obtained
by exploiting the four points placed at the end of the face (for instance, with reference to the present release of the MPEG-4
standard, points 10.9 and 10.10: right end and left end of the face, and 11.4, 2.1: top of head and tip of chin). The
following relation will then apply:

$$X_0 = \frac{X_{10.9} + X_{10.10}}{2}, \qquad Y_0 = \frac{Y_{11.4} + Y_{2.1}}{2} \tag{I}$$
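
Relation (I) amounts to taking two midpoints, as this minimal sketch shows. The function name `face_centre` and the dictionary layout (feature point index mapped to image co-ordinates) are assumptions made for illustration.

```python
def face_centre(marked):
    """Centre (X0, Y0) of the face in image co-ordinates, as in relation (I).

    X0 is the midpoint of the right and left ends of the face (10.9, 10.10);
    Y0 is the midpoint of the top of the head (11.4) and the tip of the chin (2.1).
    `marked` maps a feature point index to its (X, Y) position in the image.
    """
    x0 = (marked["10.9"][0] + marked["10.10"][0]) / 2
    y0 = (marked["11.4"][1] + marked["2.1"][1]) / 2
    return x0, y0
```

Only the X co-ordinates of the two lateral points and the Y co-ordinates of the two vertical points enter the result, so a slight tilt of the head in the photograph does not affect the computed centre.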

[0030] Distance λ is computed in such a way as to make the width of the projection of the model coincide with the
width of the face in the photograph, according to the following relation:

40 x- Xl0 9 " X ° 

X (II) 

A 10.9 v ' 

[0031] Subsequently (step 102) the position of model M along the Y axis is modified so that its projection is vertically
in register with the contents of image I. A value Δy, computed according to relation:

45 

Ay Zl1 . 4+ Z 2 , " Y 2-1 (HI) 

is added to each vertex.

[0032] In this way the model is scaled vertically. After this operation, the size of its projection coincides with the area
of the head reproduced in image I.

[0033] In a subsequent step 103, each co-ordinate Y of the vertices of model M is multiplied by a coefficient c computed
as follows:
c _ Z 2/l -(Y 2/r Y 0 )

c — (IV) 
[0034] At this point (step 104) a global transformation is performed in the vertical direction on the model in order to
make the position of some characteristic features of the face (for instance, the eyebrows) coincide with those of the person.
The model is substantially altered along the Y axis, as shown in Figure 5.

[0035] Preferably, the global transformation is a non-linear transformation, preferably of second order, and most
preferably it is based on a parabolic law, in particular of the type corresponding to a generic parabola ($y = az^2 + bz + c$)
passing in the three points of the plane YZ:
[0036] In particular in Figure 5, the model shown in a recumbent position, so in a horizontal direction, corresponds
to the model before the transformation according to the parabolic function previously described, whereas the model
shown in a vertical position is the result of said transformation.
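
A parabola of the type y = az² + bz + c passing through three given points can be computed in closed form. The sketch below is only an illustration: the three control points of the method are not reproduced here, so any points passed to it are hypothetical, and `parabola_through` and `evaluate` are names chosen for this example.

```python
def parabola_through(p1, p2, p3):
    """Coefficients (a, b, c) of y = a*z**2 + b*z + c through three (z, y) points."""
    (z1, y1), (z2, y2), (z3, y3) = p1, p2, p3
    d = (z1 - z2) * (z1 - z3) * (z2 - z3)
    if d == 0:
        raise ValueError("the three points need distinct z values")
    a = (z3 * (y2 - y1) + z2 * (y1 - y3) + z1 * (y3 - y2)) / d
    b = (z3 ** 2 * (y1 - y2) + z2 ** 2 * (y3 - y1) + z1 ** 2 * (y2 - y3)) / d
    c = (z2 * z3 * (z2 - z3) * y1
         + z3 * z1 * (z3 - z1) * y2
         + z1 * z2 * (z1 - z2) * y3) / d
    return a, b, c


def evaluate(coefficients, z):
    """Value of the parabola at z."""
    a, b, c = coefficients
    return a * z * z + b * z + c
```

With the hypothetical points (0, 0), (1, 1) and (2, 4) the function returns a = 1 and b = c = 0, i.e. the parabola y = z². Three points determine a second order curve exactly, which is why three control positions are enough to drive a global vertical deformation of this kind.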
[0037] Thereafter (step 105, with an essentially cyclic structure, defined by a choice step 106, that finds out whether
the sequence can be considered as complete) a series of transformations (translations, scalings and affine transforms)
designed to correctly position the individual features characteristic of the face is performed. Preferably the operations
involved are the following:

- the eyelids and the contour of the eyes are adapted by means of two translations and four affine transforms;
- the nose is first vertically adapted through scaling and then deformed through two affine transforms;
- the mouth is modified by applying four affine transforms;
- the region between the nose base and the upper end of the mouth is translated and scaled; and
- the region between the lower end of the mouth and the tip of the chin is translated and scaled.



[0038] Preferably the adopted affine transforms correspond to a transform that may be set out according to a relation
of the type:

$$x' = c_1 x + c_2 y + c_3$$

$$y' = c_4 x + c_5 y + c_6$$

where:

$$c_1 = \frac{(x_1' - x_3')(y_1 - y_2) - (x_1' - x_2')(y_1 - y_3)}{(y_1 - y_2)(x_1 - x_3) - (y_1 - y_3)(x_1 - x_2)}$$

$$c_2 = \frac{(x_1' - x_2')(x_1 - x_3) - (x_1' - x_3')(x_1 - x_2)}{(y_1 - y_2)(x_1 - x_3) - (y_1 - y_3)(x_1 - x_2)}$$

$$c_3 = x_1' - c_1 x_1 - c_2 y_1$$

$$c_4 = \frac{(y_1' - y_3')(y_1 - y_2) - (y_1' - y_2')(y_1 - y_3)}{(y_1 - y_2)(x_1 - x_3) - (y_1 - y_3)(x_1 - x_2)}$$

$$c_5 = \frac{(y_1' - y_2')(x_1 - x_3) - (y_1' - y_3')(x_1 - x_2)}{(y_1 - y_2)(x_1 - x_3) - (y_1 - y_3)(x_1 - x_2)}$$

$$c_6 = y_1' - c_4 x_1 - c_5 y_1$$

[0039] The described formulas express a planar transformation driven by the displacement of three points:

- $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ are the co-ordinates of such points before the transformation,
- $(x_1', y_1')$, $(x_2', y_2')$, $(x_3', y_3')$ are the corresponding co-ordinates after the transformation.
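
The coefficients of a planar affine transform driven by three point displacements can be computed directly, as in the following minimal sketch. It solves the six linear conditions imposed by the three point pairs in closed form; the function names `affine_from_points` and `apply_affine` are hypothetical.

```python
def affine_from_points(src, dst):
    """Coefficients (c1..c6) of x' = c1*x + c2*y + c3, y' = c4*x + c5*y + c6.

    `src` holds the three points before the transformation and `dst` the
    corresponding points after it, each as an (x, y) pair.
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    (u1, v1), (u2, v2), (u3, v3) = dst
    d = (y1 - y2) * (x1 - x3) - (y1 - y3) * (x1 - x2)
    if d == 0:
        raise ValueError("the three source points must not be collinear")
    c1 = ((u1 - u3) * (y1 - y2) - (u1 - u2) * (y1 - y3)) / d
    c2 = ((u1 - u2) * (x1 - x3) - (u1 - u3) * (x1 - x2)) / d
    c3 = u1 - c1 * x1 - c2 * y1
    c4 = ((v1 - v3) * (y1 - y2) - (v1 - v2) * (y1 - y3)) / d
    c5 = ((v1 - v2) * (x1 - x3) - (v1 - v3) * (x1 - x2)) / d
    c6 = v1 - c4 * x1 - c5 * y1
    return c1, c2, c3, c4, c5, c6


def apply_affine(coefficients, point):
    """Apply the transform to one (x, y) point."""
    c1, c2, c3, c4, c5, c6 = coefficients
    x, y = point
    return (c1 * x + c2 * y + c3, c4 * x + c5 * y + c6)
```

Every vertex of the region being adapted (an eyelid or a corner of the mouth, say) would then be passed through `apply_affine`, so that the three driving points land exactly on their targets and the vertices in between follow linearly.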

[0040] As the last operations concerning the geometry of the model, two wire frames representing the eyes (sclera
and iris) are positioned behind the eyelids, so as to allow their closing and to leave sufficient room for a displacement
simulating the movements of the eyes (step 107). Standard teeth which do not interfere with the movements of the
mouth are then added to the model (step 108).

[0041] The sequence shown in Figures 6A-6C represents the evolution of model M (here represented according to 
the wire frame mode, to better highlight the variations) with reference to the front appearance of the basic model (Figure 
6A), after the affine transforms (Figure 6B) and after completion with eyes and teeth (Figure 6C). 
[0042] At this point the application of the texture to the model is performed (step 109) by associating to each vertex
a bi-dimensional co-ordinate that binds it to a specific point of image I, according to a process known as "texture binding".
The data relating to the texture binding are computed by simply exploiting projection parameters α and λ defined
at the start of the calibration described at the beginning of this description. Teeth have a standard texture defined in
advance.
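
Texture binding can be pictured as projecting each vertex into the image and normalising the result. The sketch below rests on several assumptions that the text does not state: an ideal pinhole projection, pixel co-ordinates with the origin in the upper left corner and Y growing downwards, and an image-plane distance `lam` expressed in pixels. The names `texture_coordinate`, `lam`, `centre` and `size` are hypothetical.

```python
def texture_coordinate(vertex, lam, centre, size):
    """Return the (u, v) texture co-ordinate, in the range 0..1, of one vertex.

    vertex: (x, y, z) with z > 0, measured from origin O.
    lam:    distance between O and the image plane, in pixels (assumption).
    centre: (X0, Y0), the face centre in pixel co-ordinates.
    size:   (width, height) of the image in pixels.
    """
    x, y, z = vertex
    x0, y0 = centre
    width, height = size
    if z <= 0:
        raise ValueError("the model is expected in the positive half-space (z > 0)")
    px = x0 + lam * x / z
    py = y0 - lam * y / z  # image Y grows downwards, model y upwards (assumption)
    return px / width, py / height
```

Because the same projection parameters are used for calibration and for binding, a vertex that was made to coincide with a feature point of the photograph receives the texture co-ordinate of that very point.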

[0043] In the case in which the model is created starting from several images, a further step is performed concerning
the generation of the texture. Such step however is not specifically represented in the flow chart of Figure 9. As a
matter of fact, the image containing the model texture is created by joining the information associated to the various
points of sight.

[0044] Preferably, in order to better exploit the resolution of the image designed to contain the texture, the shape of
the texture of all the triangles of the model is transformed into a right triangle of a constant size. The triangles so
obtained are then coupled two by two in order to obtain a rectangular shape. The rectangles are then placed into the
image according to a matrix arrangement so as to cover its surface. The size of the rectangles is a function of the
number of triangles of the model and of the size of the image that stores the texture of the model.
[0045] Figure 10 shows an example of image containing the texture of the various triangles. Each rectangle (the
polygons shown are not squares, and are formed by N x (N + 1) pixels) contains the texture of two triangles. At the
beginning the texture of the individual triangles has a generic triangle shape that has been transformed into a right triangle
by means of an affine transform and a bi-linear filtering.

[0046] Figure 11 illustrates a detail of the previous Figure 10, showing the actual area of the texture used by two
triangles inside the rectangle 300. For each rectangle of size N x (N + 1), the effective area is N x N pixels.
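
The matrix arrangement of the rectangles can be sketched as below. The cell orientation (N + 1 pixels wide, N pixels high), the rule of choosing the largest N that fits, and the names `largest_cell` and `cell_origin` are assumptions of this sketch; the text only states that the rectangle size depends on the number of triangles and on the size of the image.

```python
import math


def largest_cell(n_triangles, width, height):
    """Largest N such that enough (N + 1) x N rectangles fit in the image.

    Two triangles share one rectangle. Returns 0 if even N = 1 does not fit.
    """
    if n_triangles <= 0:
        raise ValueError("n_triangles must be positive")
    needed = math.ceil(n_triangles / 2)
    best = 0
    n = 1
    # The capacity never grows with n, so the first failure ends the search.
    while (width // (n + 1)) * (height // n) >= needed:
        best = n
        n += 1
    return best


def cell_origin(index, n, width):
    """Upper left pixel of rectangle number `index` in a row-major layout."""
    columns = width // (n + 1)
    row, column = divmod(index, columns)
    return (column * (n + 1), row * n)
```

As a usage example, the 818 triangles of the exemplary model need 409 rectangles; in a hypothetical 512 x 512 texture image `largest_cell(818, 512, 512)` returns 24, giving 20 columns and 21 rows of 25 x 24 pixel rectangles. Triangle number t would then be stored in rectangle t // 2.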
[0047] It is worth noting that this process for texture generation is not specific for the models of human face, but can
be applied in all the cases of creation of a 3-D model starting from several images.

[0048] The model obtained in this way may be then represented by using different common graphic formats (among
which, in addition to the MPEG-4 standard previously cited, the standards VRML 2.0 and OpenInventor). All the models
can be animated so as to reproduce the lip movements and the countenances. In the case in which several images of
the person, taken from different points of sight, are available, it is possible to apply the method described to the different
images so as to enhance the look of the model. The resulting model is obviously oriented according to the orientation
of the image.

[0049] It is evident that, while keeping unchanged the invention principles set forth herein, the details of implementation
and the embodiments can be varied considerably with regard to what has been described and illustrated, without
departing from the scope of this invention, as will be defined in the following claims.

Claims 

1. Method of creating 3D facial models (M) starting from face images (I), characterised in that it comprises the operations
of:

- providing at least one face image (I);
- providing a 3-D facial model (M) having a vertex structure and comprising a number of surfaces chosen within
the group formed by: a face surface (V); right eye and left eye (OD, OS) surfaces; upper teeth and lower teeth
(DS, DI) surfaces;
- choosing respective sets of homologous points among the vertices of the structure of said model (M) and on
said at least one face image (I);
- modifying the structure of said model (M) so as to make the respective sets of homologous points coincide.

2. Method according to claim 1, characterised in that the said eye surfaces (OD, OS) and teeth surfaces (DS, DI) are
chosen in such a way as not to interfere with said face surface (V).

3. Method according to claim 1 or claim 2, characterised in that the vertices of the structure of said model (M) of the
respective set are chosen in compliance with the MPEG-4 standard.

4. Method according to any of the previous claims, characterised in that the modification of the structure of said model
(M) includes at least one of the operations chosen within the group formed by:

- making the width of the projection of the model (M) coincide with the width of said face image (I),
- vertically registering the projection of the model (M) with said face image (I),
- performing a global, non-linear transformation of the model (M) in the vertical direction in order to make the
position of at least one characteristic feature of the model (M) coincide with a homologous characteristic feature
of such face image (I).

5. Method according to claim 4, characterised in that said global transform is performed through a second order function,
preferably a parabolic function.

6. Method according to any of the previous claims, characterised in that the modification of the structure of said model
(M) includes at least one of the following operations:

- adaptation of the eyelid projection and of the eye contours in the model (M) to the homologous regions in the
face images (I), through at least one operation chosen out of a translation and an affine transform,
- adaptation, in the vertical direction, of the nose through at least one operation chosen out of a scaling and a
deformation through an affine transform,
- modification of the mouth through at least one affine transform,
- translation and scaling of the region between the nose base and the upper end of the mouth, and
- adaptation of the region between the lower end of the mouth and the chin tip by means of translation and scaling.

7. Method according to any of the previous claims, characterised in that it includes, as the final operation of the modification
of said model (M), applying said eye surfaces (OD, OS) and/or teeth surfaces (DS, DI) close to said face
surface (V).

8. Method according to any of the previous claims, characterised in that said modification of the structure of the model
(M) is carried out in the form of a geometric operation performed by positioning said face image (I) and said model
(M) in opposite and mirroring positions with respect to the origin (O) of a three-dimensional Cartesian system (X, Y, Z).

9. Method according to any of the previous claims, characterised in that it additionally comprises the operation of
applying a respective texture to said modified model.

10. Method according to claim 8 and claim 9, characterised in that it includes the operations of computing at least one
of the parameters chosen within the group including:

- distance α between said origin (O) and a centre point of said model (M), and
- distance λ between said origin (O) and the plane of said face image (I),

and of applying said texture to said modified model (M) through a process of texture binding performed on the basis
of at least one of said distance parameters.

11. Method according to claim 9 or claim 10, characterised in that it includes the operations of:

- providing a plurality of said face images (I) corresponding to different points of sight of said face,
- creating the texture to be applied to said model (M) by generating, for each of said face images, a respective
texture information in the form of right triangles of constant size,
- coupling two by two triangles relating to the texture information derived from a plurality of images so as to
obtain, as a result of the coupling, respective rectangles, and
- applying said texture to said modified model in the form of a matrix of said rectangles.